Results 1 - 10 of 10
1.
PLoS Comput Biol ; 17(2): e1008720, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33630864

ABSTRACT

Increased availability of drug response and genomics data for many tumor cell lines has accelerated the development of pan-cancer prediction models of drug response. However, it is unclear how much between-tissue differences in drug response and molecular characteristics may contribute to pan-cancer predictions. Also unknown is whether the performance of pan-cancer models could vary by cancer type. Here, we built a series of pan-cancer models using two datasets containing 346 and 504 cell lines, each with MEK inhibitor (MEKi) response and mRNA expression, point mutation, and copy number variation data, and found that, while the tissue-level drug responses are accurately predicted (between-tissue ρ = 0.88-0.98), only 5 of 10 cancer types showed successful within-tissue prediction performance (within-tissue ρ = 0.11-0.64). Between-tissue differences make substantial contributions to the performance of pan-cancer MEKi response predictions, as exclusion of between-tissue signals leads to a decrease in Spearman's ρ from a range of 0.43-0.62 to 0.30-0.51. In practice, joint analysis of multiple cancer types usually has a larger sample size, hence greater power, than for one cancer type; and we observe that higher accuracy of pan-cancer prediction of MEKi response is almost entirely due to the sample size advantage. Success of pan-cancer prediction reveals how drug response in different cancers may invoke shared regulatory mechanisms despite tissue-specific routes of oncogenesis, yet predictions in different cancer types require flexible incorporation of between-cancer and within-cancer signals. As most datasets in genome sciences contain multiple levels of heterogeneity, careful parsing of group characteristics and within-group, individual variation is essential when making robust inference.


Subjects
Antineoplastic Agents/pharmacology ; Drug Screening Assays, Antitumor ; Neoplasms/drug therapy ; Algorithms ; Area Under Curve ; Cell Line, Tumor ; DNA Copy Number Variations ; Enzyme Inhibitors/pharmacology ; Gene Dosage ; Gene Expression Profiling ; Gene Expression Regulation, Neoplastic ; Genomics ; Humans ; MAP Kinase Kinase 1/antagonists & inhibitors ; Machine Learning ; Point Mutation ; Polymorphism, Single Nucleotide ; RNA/genetics ; RNA/metabolism ; RNA, Messenger/metabolism ; Regression Analysis
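The distinction in the abstract above between between-tissue and within-tissue Spearman's ρ can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`ranks`, `spearman`, `split_correlations`) and the input layout (parallel lists of tissue labels, observed responses, and predicted responses) are assumptions made only for illustration. The between-tissue value correlates per-tissue means, while the within-tissue values correlate cell lines inside each tissue.

```python
from collections import defaultdict
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def split_correlations(tissues, observed, predicted):
    """Return (between-tissue rho, {tissue: within-tissue rho})."""
    groups = defaultdict(list)
    for t, o, p in zip(tissues, observed, predicted):
        groups[t].append((o, p))
    names = sorted(groups)
    # Between-tissue: correlate the mean observed and mean predicted response.
    between = spearman(
        [mean([o for o, _ in groups[t]]) for t in names],
        [mean([p for _, p in groups[t]]) for t in names],
    )
    # Within-tissue: correlate cell lines separately inside each tissue.
    within = {
        t: spearman([o for o, _ in groups[t]], [p for _, p in groups[t]])
        for t in names
    }
    return between, within
```

A model can score a high between-tissue ρ while ranking cell lines poorly, or even in reverse, inside an individual tissue, which is the pattern the abstract reports for half of the cancer types.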
2.
BMC Genomics ; 21(1): 159, 2020 Feb 13.
Article in English | MEDLINE | ID: mdl-32054475

ABSTRACT

BACKGROUND: Gene expression is regulated by DNA-binding transcription factors (TFs). Together with their target genes, these factors and their interactions collectively form a gene regulatory network (GRN), which is responsible for producing patterns of transcription, including cyclical processes such as genome replication and cell division. However, identifying how this network regulates the timing of these patterns, including important interactions and regulatory motifs, remains a challenging task. RESULTS: We employed four in vivo and in vitro regulatory data sets to investigate the regulatory basis of expression timing and phase-specific patterns of cell-cycle expression in Saccharomyces cerevisiae. Specifically, we considered interactions based on direct binding between TF and target gene, indirect effects of TF deletion on gene expression, and computational inference. We found that the source of regulatory information significantly impacts the accuracy and completeness of recovering known cell-cycle expressed genes. The best approach involved combining TF-target and TF-TF interaction features from multiple datasets in a single model. In addition, TFs important to multiple phases of cell-cycle expression also have the greatest impact on individual phases. Important TFs regulating a cell-cycle phase also tend to form modules in the GRN, including two sub-modules composed entirely of unannotated cell-cycle regulators (STE12-TEC1 and RAP1-HAP1-MSN4). CONCLUSION: Our findings illustrate the importance of integrating both multiple omics data and regulatory motifs in order to understand the significance of regulatory interactions involved in timing gene expression. This integrated approach allowed us to recover both known cell-cycle interactions and the overall pattern of phase-specific expression across the cell cycle better than any single data set. Likewise, by looking at regulatory motifs in the form of TF-TF interactions, we identified sets of TFs whose co-regulation of target genes was important for cell-cycle expression, even when regulation by individual TFs was not. Overall, this demonstrates the power of integrating multiple data sets and models of interaction in order to understand the regulatory basis of established biological processes and their associated gene regulatory networks.


Subjects
Gene Expression Regulation, Fungal ; Gene Regulatory Networks ; Genes, cdc ; Genomics ; Saccharomyces cerevisiae/genetics ; Computational Biology/methods ; Genomics/methods ; Machine Learning ; Protein Binding ; Protein Interaction Mapping ; Protein Interaction Maps ; Saccharomyces cerevisiae/metabolism ; Saccharomyces cerevisiae Proteins/genetics ; Saccharomyces cerevisiae Proteins/metabolism ; Transcription Factors/metabolism
3.
NAR Genom Bioinform ; 2(3): lqaa049, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33575601

ABSTRACT

Plants respond to their environment by dynamically modulating gene expression. A powerful approach for understanding how these responses are regulated is to integrate information about cis-regulatory elements (CREs) into models called cis-regulatory codes. Transcriptional response to combined stress is typically not the sum of the responses to the individual stresses. However, cis-regulatory codes underlying combined stress response have not been established. Here we modeled transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana. We grouped genes by their pattern of response (independent, antagonistic and synergistic) and trained machine learning models to predict their response using putative CREs (pCREs) as features (median F-measure = 0.64). We then developed a deep learning approach to integrate additional omics information (sequence conservation, chromatin accessibility and histone modification) into our models, improving performance by 6.2%. While pCREs important for predicting independent and antagonistic responses tended to resemble binding motifs of transcription factors associated with heat and/or drought stress, important synergistic pCREs resembled binding motifs of transcription factors not known to be associated with stress. These findings demonstrate how in silico approaches can improve our understanding of the complex codes regulating response to combined stress and help us identify prime targets for future characterization.

4.
Breast Cancer Res Treat ; 179(2): 337-347, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31655920

ABSTRACT

PURPOSE: There is a need for biomarkers of drug efficacy for targeted therapies in triple-negative breast cancer (TNBC). As a step toward this, we identify multi-omic molecular determinants of anti-TNBC efficacy in cell lines for a panel of oncology drugs. METHODS: Using 23 TNBC cell lines, drug sensitivity scores (DSS3) were determined using a panel of investigational drugs and drugs approved for other indications. Molecular readouts were generated for each cell line using RNA sequencing, RNA targeted panels, DNA sequencing, and functional proteomics. DSS3 values were correlated with molecular readouts using an FDR-corrected significance cutoff of p* < 0.05 and yielded molecular determinant panels that predict anti-TNBC efficacy. RESULTS: Six molecular determinant panels were obtained from 12 drugs we prioritized based on their efficacy. Determinant panels were largely devoid of DNA mutations of the targeted pathway. Molecular determinants were obtained by correlating DSS3 with molecular readouts. We found that co-inhibiting molecular correlate pathways leads to robust synergy across many cell lines. CONCLUSIONS: These findings demonstrate an integrated method to identify biomarkers of drug efficacy in TNBC where DNA predictions correlate poorly with drug response. Our work outlines a framework for the identification of novel molecular determinants and optimal companion drugs for combination therapy based on these correlates.


Subjects
Antineoplastic Agents/pharmacology ; Drug Resistance, Neoplasm ; Triple Negative Breast Neoplasms/drug therapy ; Triple Negative Breast Neoplasms/etiology ; Antineoplastic Agents/therapeutic use ; Antineoplastic Combined Chemotherapy Protocols/adverse effects ; Antineoplastic Combined Chemotherapy Protocols/therapeutic use ; Cell Line, Tumor ; Computational Biology/methods ; Dose-Response Relationship, Drug ; Drug Resistance, Neoplasm/genetics ; Drug Screening Assays, Antitumor ; Female ; Gene Expression Profiling ; Humans ; Mutation ; Proteomics ; Treatment Outcome ; Triple Negative Breast Neoplasms/metabolism
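The abstract above mentions an FDR-corrected significance cutoff but does not say which correction procedure was used. As a hedged illustration only, the sketch below assumes the Benjamini-Hochberg procedure, one common choice; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as a common example of FDR control;
    the abstract does not state which procedure the authors applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is
    # the minimum of its own scaled p-value and those of higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four drug/readout correlations.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

With these hypothetical inputs, three raw p-values fall below 0.05 but only one remains below the cutoff after adjustment, which is why a corrected threshold matters when many molecular readouts are tested against each drug.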
5.
Sci Rep ; 9(1): 12122, 2019 Aug 20.
Article in English | MEDLINE | ID: mdl-31431676

ABSTRACT

Extensive transcriptional activity occurring in intergenic regions of genomes has raised the question whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Among 43,301 ITRs across the four species, 34,460 (80%) are species-specific. ITRs found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess if ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could accurately distinguish between phenotype genes and pseudogenes (area under curve-receiver operating characteristic = 0.94). Based on the models, 584 (8%) and 4391 (61%) rice ITRs are classified as likely functional and nonfunctional with high confidence, respectively. ITRs with conserved expression and ancient retained duplicates, features that were not part of the model, are frequently classified as likely-functional, suggesting these characteristics could serve as pragmatic rules of thumb for identifying candidate sequences likely to be under selection. This study also provides a framework to identify novel genes using comparative transcriptomic data to improve genome annotation that is fundamental for connecting genotype to phenotype in crop and model systems.


Subjects
DNA, Intergenic ; Genes, Plant ; Poaceae/genetics ; Transcription, Genetic ; Biological Evolution ; Genome, Plant ; Machine Learning ; Models, Genetic ; Phenotype ; Pseudogenes ; Species Specificity
6.
Proc Natl Acad Sci U S A ; 116(6): 2344-2353, 2019 Feb 5.
Article in English | MEDLINE | ID: mdl-30674669

ABSTRACT

Plant specialized metabolism (SM) enzymes produce lineage-specific metabolites with important ecological, evolutionary, and biotechnological implications. Using Arabidopsis thaliana as a model, we identified distinguishing characteristics of SM and GM (general metabolism, traditionally referred to as primary metabolism) genes through a detailed study of features including duplication pattern, sequence conservation, transcription, protein domain content, and gene network properties. Analysis of multiple sets of benchmark genes revealed that SM genes tend to be tandemly duplicated, coexpressed with their paralogs, narrowly expressed at lower levels, less conserved, and less well connected in gene networks relative to GM genes. Although the values of each of these features significantly differed between SM and GM genes, any single feature was ineffective at predicting SM from GM genes. Using machine learning methods to integrate all features, a prediction model was established with a true positive rate of 87% and a true negative rate of 71%. In addition, 86% of known SM genes not used to create the machine learning model were predicted. We also demonstrated that the model could be further improved when we distinguished between SM, GM, and junction genes responsible for reactions shared by SM and GM pathways, indicating that topological considerations may further improve the SM prediction model. Application of the prediction model led to the identification of 1,220 A. thaliana genes with previously unknown functions, each assigned a confidence measure called an SM score, providing a global estimate of SM gene content in a plant genome.

7.
Mol Biol Evol ; 35(6): 1422-1436, 2018 Jun 1.
Article in English | MEDLINE | ID: mdl-29554332

ABSTRACT

With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning models using Arabidopsis thaliana as a model that accurately distinguish functional sequences (benchmark protein-coding and RNA genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.


Subjects
DNA, Intergenic ; Genome Size ; Genome, Plant ; Models, Genetic ; RNA, Untranslated ; DNA Methylation ; Machine Learning ; Magnoliopsida ; Phenotype ; Transcription, Genetic
8.
Mol Biol Evol ; 34(7): 1788-1798, 2017 Jul 1.
Article in English | MEDLINE | ID: mdl-28398576

ABSTRACT

The human genome is dominated by large tracts of DNA with extensive biochemical activity but no known function. In particular, it is well established that transcriptional activities are not restricted to known genes. However, whether this intergenic transcription represents activity with functional significance or noise is under debate, highlighting the need for an effective method of defining functional genomic regions. Moreover, these discoveries raise the question whether genomic regions can be defined as functional based solely on the presence of biochemical activities, without considering evolutionary (conservation) and genetic (effects of mutations) evidence. Here, computational models integrating genetic, evolutionary, and biochemical evidence are established that provide reliable predictions of human protein-coding and RNA genes. Importantly, in addition to sequence conservation, biochemical features allow accurate predictions of genic sequences with phenotypic evidence under strong purifying selection, suggesting that they can be used as an alternative measure of selection. Moreover, 18.5% of annotated noncoding RNAs exhibit higher degrees of similarity to phenotype genes and, thus, are likely functional. However, 64.5% of noncoding RNAs appear to belong to a sequence class of their own, and the remaining 17% are more similar to pseudogenes and random intergenic sequences that may represent noisy transcription.


Subjects
Computational Biology/methods ; DNA, Intergenic/genetics ; Sequence Analysis, DNA/methods ; Animals ; Biological Evolution ; Computer Simulation ; Conserved Sequence/genetics ; Evolution, Molecular ; Genome, Human ; Genomics/methods ; Humans ; Pseudogenes/genetics ; RNA ; RNA, Untranslated ; Selection, Genetic ; Transcription, Genetic
9.
Plant Cell ; 27(8): 2133-47, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26286535

ABSTRACT

Essential genes represent critical cellular components whose disruption results in lethality. Characteristics shared among essential genes have been uncovered in fungal and metazoan model systems. However, features associated with plant essential genes are largely unknown and the full set of essential genes remains to be discovered in any plant species. Here, we show that essential genes in Arabidopsis thaliana have distinct features useful for constructing within- and cross-species prediction models. Essential genes in A. thaliana are often single copy or derived from older duplications, highly and broadly expressed, slow evolving, and highly connected within molecular networks compared with genes with nonlethal mutant phenotypes. These gene features allowed the application of machine learning methods that predicted known lethal genes as well as an additional 1970 likely essential genes without documented phenotypes. Prediction models from A. thaliana could also be applied to predict Oryza sativa and Saccharomyces cerevisiae essential genes. Importantly, successful predictions drew upon many features, while any single feature was not sufficient. Our findings show that essential genes can be distinguished from genes with nonlethal phenotypes using features that are similar across kingdoms and indicate the possibility for translational application of our approach to species without extensive functional genomic and phenomic resources.


Subjects
Arabidopsis/genetics ; Genes, Lethal/genetics ; Genes, Plant/genetics ; Mutation ; Evolution, Molecular ; Gene Dosage ; Gene Expression Regulation, Plant ; Gene Ontology ; Genes, Essential/genetics ; Oryza/genetics ; Phenotype ; Saccharomyces cerevisiae ; Species Specificity ; Support Vector Machine
10.
Plant Methods ; 11: 10, 2015.
Article in English | MEDLINE | ID: mdl-25774204

ABSTRACT

BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.
